Accelerating Deep Neural Networks for Efficient Scene Understanding in Multi-Modal Automotive Applications
Authors
Abstract
Environment perception constitutes one of the most critical operations performed by semi- and fully-autonomous vehicles. In recent years, Deep Neural Networks (DNNs) have become the standard tool for perception solutions owing to their impressive capabilities in analyzing and modelling complex dynamic scenes from (often multi-modal) sensory inputs. However, the well-established performance of DNNs comes at the cost of increased time and storage complexity, which may be problematic in automotive systems due to the requirement of a short prediction horizon (in many cases inference must be performed in real time) and the limited computational, storage, and energy resources of mobile systems. A common way of addressing this problem is to transform the original large pre-trained networks into new, smaller models using Model Compression and Acceleration (MCA) techniques, improving both execution and storage efficiency. Within the MCA framework, in this paper we investigate the application of two state-of-the-art weight-sharing techniques, namely a Vector Quantization (VQ) and a Dictionary Learning (DL) one, as well as novel extensions thereof, towards the acceleration and compression of DNNs widely used in 2D and 3D object-detection applications. Apart from the individual (uni-modal) networks, we also present and evaluate a multi-modal late-fusion algorithm combining the detection results of the 2D and 3D detectors. Our evaluation studies are carried out on the KITTI Dataset. The obtained results lend themselves to a twofold interpretation. On the one hand, they showcase the significant speed gains that can be achieved via weight sharing in the selected DNN detectors, with moderate accuracy loss, and highlight the differences between the utilized approaches. On the other, they demonstrate a substantial boost in the outcome of the uni-modal detectors using the proposed fusion-based approach. Indeed, our experiments have shown that pairing the high-performance DL-based technique with the loss-mitigating effect of the fusion approach leads to highly accelerated models (up to approximately 2.5× and 6×, respectively) with an accuracy loss of the fused results ranging within single-digit figures (as low as around 1% for the class "cars").
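The weight-sharing idea behind vector quantization can be illustrated with a minimal sketch: cluster the scalar weights of a layer with k-means and replace every weight by its cluster centroid, so that only a small codebook plus per-weight indices must be stored. This is a generic illustration of the technique, not the paper's implementation; the function names, cluster count, and plain k-means loop below are all illustrative assumptions.

```python
import numpy as np

def vq_weight_sharing(weights, n_clusters=16, n_iters=25, seed=0):
    """Compress a weight matrix by scalar vector quantization (weight
    sharing): k-means clusters the weights, and each weight is mapped to
    the index of its nearest centroid. Storage then amounts to one small
    codebook plus log2(n_clusters)-bit indices per weight."""
    rng = np.random.default_rng(seed)
    flat = weights.reshape(-1)
    # Initialise centroids by sampling distinct existing weight values.
    centroids = rng.choice(flat, size=n_clusters, replace=False)
    idx = np.zeros(flat.size, dtype=np.int64)
    for _ in range(n_iters):
        # Assignment step: nearest centroid for every weight.
        idx = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        # Update step: move each centroid to the mean of its members.
        for k in range(n_clusters):
            members = flat[idx == k]
            if members.size:
                centroids[k] = members.mean()
    return centroids, idx.reshape(weights.shape)

def reconstruct(centroids, indices):
    """Rebuild the approximate weight matrix from codebook + indices."""
    return centroids[indices]
```

A quick check on a random layer shows the trade-off: the reconstructed matrix contains at most `n_clusters` distinct values, and its quantization error is far smaller than the typical weight magnitude.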
Similar resources
Multi-Modal Scene Understanding for Robotic Grasping
Current robotics research is largely driven by the vision of creating an intelligent being that can perform dangerous, difficult or unpopular tasks. These can for example be exploring the surface of planet Mars or the bottom of the ocean, maintaining a furnace or assembling a car. They can also be more mundane such as cleaning an apartment or fetching groceries. This vision has been pursued sin...
High-order Deep Neural Networks for Learning Multi-Modal Representations
In multi-modal learning, data consists of multiple modalities, which need to be represented jointly to capture the real-world ’concept’ that the data corresponds to (Srivastava & Salakhutdinov, 2012). However, it is not easy to obtain the joint representations reflecting the structure of multi-modal data with machine learning algorithms, especially with conventional neural networks. This is bec...
Multi-modal scene understanding using probabilistic models
In order to understand the contribution of this thesis, the positioning and the limitations of the solved problems must be known. Therefore, an overview of the research directions concerning the integration of speech/NL and image processing is given and some basic principles of automatic speech understanding and computer vision as separate modalities are presented. Finally, the scope of the the...
Efficient Convolutional Patch Networks for Scene Understanding
In this paper, we present convolutional patch networks, which are convolutional (neural) networks (CNN) learned to distinguish different image patches and which can be used for pixel-wise labeling. We show how to easily learn spatial priors for certain categories jointly with their appearance. Experiments for urban scene understanding demonstrate state-of-the-art results on the LabelMeFacade da...
Multi-modal Geolocation Estimation Using Deep Neural Networks
Estimating the location where an image was taken based solely on the contents of the image is a challenging task, even for humans, as properly labeling an image in such a fashion relies heavily on contextual information, and is not as simple as identifying a single object in the image. Thus any methods which attempt to do so must somehow account for these complexities, and no single model to da...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3258400